Method and apparatus for data compression and decompression for a data processor system

ABSTRACT

During a compressing portion, memory (20) is divided into cache line blocks (500). Each cache line block is compressed and modified by replacing address destinations of address indirection instructions with compressed address destinations. Each cache line block is modified to have a flow indirection instruction as the last instruction in each cache line. The compressed cache line blocks (500) are stored in a memory (858). During a decompression portion, a cache line (500) is accessed based on an instruction pointer (902) value. The cache line is decompressed and stored in cache. The cache tag is determined based on the instruction pointer (902) value.

FIELD OF THE INVENTION

[0001] This invention relates generally to data compression, and more particularly, to data compression for a microprocessor system having a cache.

BACKGROUND OF THE INVENTION

[0002] Many modern technologies that use microprocessors or microcontrollers, such as hand-held electronic applications, require high performance processing power combined with highly efficient implementations to reduce system costs and space requirements. The use of instruction caches and data caches in order to improve performance is well known in the industry. In an effort to further reduce system size and cost, it is known to compress instruction data to minimize the amount of memory a system will need. Before an instruction contained in a compressed memory can be used, the information contained within that memory must be decompressed in order for the target data processor to execute it.

[0003] A prior art method of handling the compression of data for use in a data processor system and the decompression of data for use by that data processor system uses the following steps: dividing the uncompressed program into separate cache blocks; compressing each cache block; and compacting the individual compressed blocks into a memory. By breaking the program into individual cache blocks, where a cache block represents the number of words in each cache line, it is possible to efficiently compress the data associated with each cache block. Since modern data processing systems generally load an entire cache line at a time, it is possible to fill an entire cache line efficiently by knowing the starting address of a compressed cache block.

[0004] The prior art method requires the generation of a look-aside table (LAT). The look-aside table keeps track of which compressed address relates to which cache tag of the data processor. When the instruction pointer of the data processing system requires an address which is not resident within the instruction cache, it is necessary for the data processor system to determine where in compressed memory the required information resides. This information is maintained in the look-aside table stored in the system memory. When a cache miss occurs, the data processor system utilizes a cache refill engine to provide the appropriate information to the next available cache line. The cache refill engine parses the LAT to correlate the new cache tag to the compressed memory. This correlation describes the cache block address, in compressed memory, where the requested instruction resides. Once determined, the compressed memory is accessed, decompressed, and used to fill the appropriate cache line. The cache line containing the newly stored information maintains the original address tag as determined by the instruction pointer for its cache tag. The next time the instruction pointer requests information having the same address tag, a cache hit will occur, indicating the data is in the cache, and processing will continue in a normal fashion, provided the cache line has not been cleared.

[0005] In order to reduce the overhead of the cache refill engine having to search through the look-aside table in system memory, it is common for data processor systems to use a compressed cache look-aside buffer (CLB). The CLB maintains a list of recently translated address tags and their corresponding address information in compressed memory. By maintaining an on-chip CLB, overhead associated with parsing the LAT is avoided.

[0006] A disadvantage of the prior art system is that it requires a translation of the address tag into the appropriate compressed address location. This is accomplished at the expense of providing and maintaining a CLB, and increasing the complexity of the cache refill engine, which must search a LAT in order to determine the appropriate compressed memory location to access. In addition, it is necessary to perform these functions each time a cache miss occurs. As a result, each cache tag will be re-translated every time it is cleared out of the cache. Therefore, a method, and a data processor, that allows for execution of compressed programs while limiting physical overhead and execution time associated with translation is needed.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] FIG. 1 illustrates, in flow diagram form, a method of compression in accordance with the present invention.

[0008] FIG. 2 illustrates, in block diagram form, a memory area at different stages of compression in accordance with the present invention.

[0009] FIG. 3 illustrates, in block diagram form, a data processing system in accordance with the present invention.

[0010] FIG. 4 illustrates, in flow diagram form, a method of decompression in accordance with the present invention.

[0011] FIG. 5 illustrates, in block diagram form, a cache line.

[0012] FIG. 6 illustrates, in flow diagram form, a method of determining if a fall through condition exists.

[0013] FIG. 7 illustrates, in flow diagram form, another method of decompression in accordance with the present invention.

[0014] FIG. 8 illustrates, in block diagram form, a detailed view of a computer processor from FIG. 3.

[0015] FIG. 9 illustrates, in block diagram form, a detailed view of the computer processor 22 of FIG. 8.

[0016] FIG. 10 illustrates, in flow diagram form, another method of compression in accordance with the present invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

[0017] Generally, the present invention provides a method and apparatus for compressing instruction memory for use in a cache system, such that the amount of overhead associated with a data processor in terms of size and time of execution is minimized.

[0018] Known cached compression systems rely upon the use of look-aside tables (LAT) and compressed cache look-aside buffers (CLB) to decode sequentially addressed cache tags. For example, if an address tag for an instruction is 1234, and the instruction is stored in the final word of a cache line, the next instruction in series would have an address tag 1235. Assuming the two instructions occur in sequence, a fall through happens, and a cache miss occurs. As a result, the CLB will be queried, and ultimately a search through the LAT can occur in order to identify the address of the compressed data location containing the needed instruction. Once the address of the compressed location is identified, the data stored there will be loaded into the cache line now containing the cache tag 1235. Note that there is no relationship between the address tag 1235 and the address location identifying the beginning of the compressed cache block other than the correlation provided by the look-aside table (LAT). The present invention provides a one-to-one correlation between the address tag and the compressed memory. This simplifies the steps of address translation when using compressed instructions.

[0019] FIG. 1 illustrates, in flow diagram form, a method 100 for compressing computer instructions, such that the computer instructions may be stored in memory and accessed by a data processor without the use of a look-aside table or associated CLB. At step 101, pre-compression steps are completed. This will include such steps as compiling and linking of source code. At step 110, the uncompressed code is divided into uncompressed cache line blocks. For example, if a cache line holds 16 words, this step could divide the uncompressed code into 16-word blocks, or fewer as needed. After step 110, and before compression, a branch or jump instruction would have a relative displacement or an absolute address which would be used in determining the actual address of the target instruction of the branch or jump instruction. In the prior art, this displacement or absolute address would be compressed. After decompression, it would contain the same displacement or address that it would have had before compression, and the LAT or CLB would be used to find where the compressed code for the target instruction was located. In step 120 of the present invention, in contrast, the displacement or absolute address in the branch or jump instruction is replaced by a transformed displacement or absolute address before compression. After compression and upon subsequent decompression, this transformed address is to be quickly and unambiguously divisible into the starting address of the compressed cache line in compressed memory and the word offset identifying the instruction location within the cache line. The first time through or on subsequent iterations, the address of the compressed cache line containing the target instruction or the offset of the target instruction within that cache line may not be known. On all except the final pass through step 120, the actual value is not needed.
All that is needed is the number of bits which will be required to encode the displacement or absolute value, since in all but the last pass through step 120, the purpose of step 120 is merely to determine how many bits will be needed to carry out the encoding for each cache line. If the number of bits needed to encode the absolute address or displacement is a monotonic non-decreasing function of the magnitude of the absolute address or displacement, it is easy to show that step 120 and each of the other steps in FIG. 1 will only need to be carried out a finite number of times, which guarantees the convergence needed for step 135 discussed below. In practice, the number of iterations is likely to be small. If, for a particular branch or jump instruction, the target instruction's compressed cache line address has not already been tentatively determined (for example, a forward branch the first time that step 120 is executed), the number of bits used for the coding should be the minimum number of bits which is permitted for an absolute address or displacement in the particular coding method chosen, and the value of these bits is immaterial. Otherwise, the transformed displacement or absolute address should be computed using the address of the compressed cache line and offset of the target instruction, and the number of bits needed to encode this transformed value should be used. Next, in step 130, each of the uncompressed cache line blocks is compressed. In addition, it is understood that subsequent iterations of this step may not require complete recompression, as previous compression information may be maintained. Actually, in all but the last stage, only the number of bits and not the actual values of the coded instructions need be determined. At step 135, a determination is made whether the value of any transformed field will need to be recalculated.
This will be necessary if the coded value for any transformed displacement or absolute address was not known the last time that step 120 was performed or if any displacement could have changed. If so, flow returns to step 120; if not, flow continues to step 140. The primary purpose of the loop comprising steps 120 to 135 is to achieve self-consistency between the transformed displacements or absolute addresses and the actual locations of each compressed cache line. One skilled in the art could find numerous minor modifications in the control structure of this loop to achieve the same objective. It is largely immaterial whether the actual compression of the instructions is done once in this loop or after this loop has been completed, since for the purpose of achieving self-consistency only the size of the compressed data is needed, not the actual compressed values. The actual compressed values are only needed once self-consistency has been achieved. At step 140, each compressed line block is compacted into a final memory.
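The loop of steps 120 to 135 can be sketched as a fixed-point iteration over block sizes. The following is only an illustrative model, not the coding method of the specification: the names `field_bits`, `converge`, `MIN_BITS`, the byte-granular block addresses, and the idea of a fixed "payload" size per block are all assumptions made for the example. It shows why a monotonic non-decreasing field width lets the compressed addresses settle after a few passes.

```python
from math import ceil

MIN_BITS = 4  # hypothetical minimum width of an encoded address field


def field_bits(address):
    """Bits needed to encode a target address (monotonic non-decreasing).

    An address of None means the target has not been placed yet, so the
    minimum width is assumed, as on the first pass through step 120.
    """
    if address is None:
        return MIN_BITS
    return max(MIN_BITS, address.bit_length())


def converge(payload_bits, targets, max_passes=100):
    """Iterate until every block's start address is self-consistent.

    payload_bits[i]: compressed size of block i, excluding address fields.
    targets[i]: indices of the blocks that block i branches or jumps to.
    Returns (start addresses in bytes, number of passes used).
    """
    addrs = [None] * len(payload_bits)
    for passes in range(1, max_passes + 1):
        new_addrs, cursor = [], 0
        for i, payload in enumerate(payload_bits):
            new_addrs.append(cursor)
            # Only sizes matter here; actual field values are not needed yet.
            bits = payload + sum(field_bits(addrs[t]) for t in targets[i])
            cursor += ceil(bits / 8)
        if new_addrs == addrs:  # step 135: nothing needs recalculating
            return addrs, passes
        addrs = new_addrs
    raise RuntimeError("addresses did not converge")


addrs, passes = converge([100, 200, 50], [[2], [0], [1]])
print(addrs, passes)
```

In this toy model the third block moves once because the first block's jump field widens, after which a further pass confirms that no address changes.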

[0020] In some implementations, it might be that the transformed displacement would be too large for an instruction format and it might be necessary to alter the original code by replacing a single flow indirection by a pair of flow indirection instructions. In this case, an additional possible control flow back to step 101 would be required. Provided that these code augmentations only increased code size, the monotonic principle would still apply and convergence would be obtained after a finite number of steps.

[0021] FIG. 2 illustrates, in a block diagram, the effects of each step of method 100 on uncompressed unmodified code 20. In one embodiment of the invention, uncompressed unmodified code 20 represents compiled linked code ready to be run in an uncompressed format on a data processor. During step 110 of method 100, the uncompressed unmodified code 20 would be divided into uncompressed cache line blocks as shown in the divided uncompressed code 30. For a given CPU architecture using a fixed instruction size, each cache line block will contain a fixed number of instructions. The number of instructions in a given cache line block will be dependent upon the cache line size of the data processing system. In one embodiment of the present invention, there is a one-to-one correspondence between the number of instructions capable of being contained in each cache line block and the number actually stored. In another embodiment there will be fewer instructions stored in each cache line block than can be accommodated by the cache line of the data processor system. For example, if a data processing system was capable of holding 16 instruction words on a single cache line, the cache line block could contain 16 or fewer words. The embodiment of a system containing fewer than the maximum number of words will be discussed later.

[0022] Once divided into blocks, the individual instructions can be referenced by a portion of the address known as an address tag, which identifies the beginning of a specific cache line block, and an offset representing an offset into the specific block. At step 120, the address is replaced with an address tag and offset, and a size designator indicating the number of bits reserved for containing the compressed information. During the initial pass through step 120, an estimated number of bits is used; subsequent passes will determine the number of bits based upon the compressed information until all destinations are successfully written. For example, initially the address location ADDRx is referenced as ADDR+3.2 (7). This indicates that the location at ADDRx is in the fourth cache block at the third cache location, and that in compressed form, it is expected to be stored in seven bits. Note that the number of needed bits may be stored in a separate memory location. For example, the flow indirection instruction JMP ADDRx is also referenced by JMP ADDR+3.2 as seen in the uncompressed divided code 30. During normal execution, the jump instruction will cause the instruction pointer to be loaded with the value ADDRx that contains the address tag in a portion of the most significant bits, and the offset in the remaining least significant bits. As a result, the instruction pointer will point to the instruction residing at instruction location ADDRx, which in this case contains the instruction LD Z. In future iterations of step 120, a compressed address location will replace the address tag. The addresses of the uncompressed unmodified code 20 can be regarded as physical or logical addresses, where the code starts at an address ADDR0 and is contiguous through the end of file. After the transformation of the addresses has converged, the compressed code 40 provides a compressed representation of the decompressed code 45.
The transformed jump instruction, which will be the instruction generated by decompression, now will be JMP CADDR3.2, where the CADDR3 component of the address is the address of the first byte of the compressed code for the cache line with the target instruction LD Z and the second component of the address is the offset of the instruction within that cache line after decompression.
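As a small illustration of the tag-and-offset split described above, the sketch below packs a compressed cache line address into the most significant bits of a value and the word offset into the least significant bits. The 4-bit offset width (matching a 16-word cache line) and the function names `pack_target` and `unpack_target` are assumptions for the example; the specification does not fix a field layout.

```python
OFFSET_BITS = 4  # assumed: 16 words per cache line, so 4 offset bits
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def pack_target(compressed_line_addr, word_offset):
    """Combine a compressed line address (tag) and a word offset."""
    if not 0 <= word_offset <= OFFSET_MASK:
        raise ValueError("word offset does not fit in the offset field")
    return (compressed_line_addr << OFFSET_BITS) | word_offset


def unpack_target(value):
    """Split a transformed address back into (tag, word offset)."""
    return value >> OFFSET_BITS, value & OFFSET_MASK


# A jump to word 2 of the line whose compressed code starts at byte 0x2A,
# in the spirit of JMP CADDR3.2 (the value 0x2A is made up).
target = pack_target(0x2A, 2)
print(hex(target), unpack_target(target))
```

Because the split is a shift and a mask, the tag recovered from the instruction pointer can be used directly as the compressed memory address without any table lookup.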

[0023] At step 130, efficiency is realized by compressing the individual cache line blocks to create compressed code 40, and the compressed destination is stored for each indirection instruction if a sufficient number of bits has been allocated. Next, at step 135, flow proceeds to step 120 if all of the addresses have not converged; otherwise flow proceeds to step 140. At step 140, the code is written into memory as represented by compressed data 40. Decompressed code 45 represents the decompressed data 40 as used by the data processor system. Note that the address space of the decompressed code 45 is not contiguous.

[0024] FIG. 3 represents a data processor system 320 having a computer processor 322. In the system 320, the compressed modified code 40 is contained in memory 324. When a jump instruction is encountered in the instruction flow, the computer processor 322 will determine that the address tag (for example CADDR3 for the JMP CADDR3.2 instruction of FIG. 2) associated with the jump address is not currently in the cache. Therefore, a cache miss signal will be generated and sent to the cache refill engine along with the address tag. The cache refill engine will use the address tag which was provided by the jump instruction and directly access that location (for example CADDR3) within the compressed modified code 40. Directly addressing means that no translation needs to take place between the cache address tag as provided by the computer processor 322, and the actual step of addressing the data referenced by that tag in compressed memory. Therefore, by modifying the uncompressed unmodified code 20 (FIG. 2) to contain the address of the compressed modified code 40, the need for look-aside tables and cacheable translation buffers is eliminated.

[0025] The method 100 works well for flow indirection instructions. However, when straight line code is encountered a fall through situation can occur. A fall through situation occurs when a program flow advances from the last instruction of a cache line to the first instruction of the next cache line as a result of the instruction pointer being incremented. This is a normal situation that occurs when sequential code crosses a cache line boundary. In prior art systems, the new cache tag would be generated by incrementing the old tag by one, causing a new address tag to occur. In a prior art data processor system a cached address translation occurs through the use of either the CLB or actually performing a table search in the LAT. The look-up functions identify the appropriate location in compressed memory to retrieve the cache line that contains desired information.

[0026] In the present invention, an incremented cache tag has very little meaning, since the tag is used to access the compressed memory directly. Therefore, an incremented address tag would access the next sequential memory location in the compressed modified code 40. Referring to FIG. 2, if compressed modified code address CADDR2 represented the current address tag, and the address tag were incremented by one, the location CADDR2+1 would reside within the compressed cache line block beginning at address CADDR2, instead of at the desired location CADDR3.

[0027] FIG. 4 illustrates a decompression flow to address the fall through situation. At step 401, any required pre fall-through steps are performed. At step 410, a compressed cache line block is decompressed and the size of the compressed block is determined. Next, at step 420, a jump instruction is generated to redirect flow to the address of the first word of the next compressed cache line block. In order for this flow to function properly, it is necessary for each cache line block to contain at least one fewer instruction than the maximum cache line size would allow. For example, if the data processor system has a maximum cache line size of 16 words, where each word contains one instruction, during step 110 of FIG. 1 the modified uncompressed code would be divided into blocks containing 15 words. This would leave space for the decompression routine to store a jump instruction referencing the next instruction. This jump instruction will redirect flow to the appropriate location within the compressed code, instead of allowing a fall through situation with an incremented address tag. Note that it is likely that many address tags will contain no executable code. This scheme assumes that available address space exists to allow for these unused address tags.
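The decompression flow of FIG. 4 can be illustrated with a short sketch. Everything concrete here is an assumption for the example: zlib stands in for the unspecified compression scheme, instructions are modeled as text mnemonics, and the names `compact` and `refill` are hypothetical. The point is that the size of the compressed block is learned as a by-product of decompressing it, which gives the address for the appended jump.

```python
import zlib

LINE_WORDS = 16  # assumed cache line capacity; one word is kept for the jump


def compact(blocks):
    """Compress each block and pack the results back to back (step 140)."""
    memory, starts = b"", []
    for block in blocks:
        starts.append(len(memory))
        memory += zlib.compress("\n".join(block).encode())
    return memory, starts


def refill(memory, tag):
    """Decompress the block at byte address `tag` and append a jump.

    Returns the words for one cache line: the block's instructions
    followed by a jump to word 0 of the next compressed block.
    """
    stream = zlib.decompressobj()
    raw = stream.decompress(memory[tag:])
    # Bytes after the end of this block's stream are left in unused_data,
    # so the compressed size falls out of the decompression itself.
    compressed_size = len(memory) - tag - len(stream.unused_data)
    words = raw.decode().split("\n")
    assert len(words) < LINE_WORDS, "block must leave room for the jump"
    words.append(f"JMP {tag + compressed_size}.0")
    return words


blocks = [[f"OP{i}" for i in range(15)], [f"LD{i}" for i in range(15)]]
memory, starts = compact(blocks)
line = refill(memory, starts[0])
print(line[-1], starts[1])
```

The final word of the refilled line names the start of the second compressed block, so sequential execution continues there instead of at an incremented tag.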

[0028] In the embodiment discussed above, it is seen that efficiency is gained at run time by eliminating the need for LATs and CLBs. This is accomplished by applying pre-execution compression after compiling and linking of the source code. This embodiment requires no modifications to the computer processor 322. As a result, the computer processor 322 does not need to support memory management functions of look-aside buffers for the purposes of compressed data, nor does the memory 324 need to contain look-aside tables.

[0029] FIG. 5 illustrates in block diagram form a cache line 500 which may be used in a second embodiment of the invention, allowing all cache words of the cache block to be used for user instructions. The cache line 500 has a tag and cache words CW0-CWN. In addition, an offset field 510 is associated with the cache line 500. This offset field is used to identify the offset from the beginning of the current cache line in compressed memory to the start of the next compressed cache line in compressed memory. Since the compressed address is accessed directly by the tag of a given cache line, the appropriate tag for the next cache line can be obtained by adding the tag of the current cache line to the offset 510 representing the size of the current cache line in compressed memory. In order to use an offset scheme as described above, it is necessary for the CPU 22 to recognize when the instruction pointer has been incremented across a cache line such that a new tag value can be generated.

[0030] FIG. 6 illustrates a method of determining when a new tag value needs to be generated. The method 600 is used each time an instruction pointer is incremented. At step 610, it is determined if the word offset into the cache line is equal to 0. A word offset of zero can be obtained in one of two ways. The first is by a jump or branch instruction specifying a destination which is contained at the first word within a cache line. As discussed previously, when a jump or branch instruction is used with current embodiments, the specified tag as a result of a branch or jump will be correct as defined, and no corrective action will be needed. The second way a word offset of zero is obtained is when a fall through situation occurs between cache lines. For example, for a tag value of $20 ($ designates hexadecimal numbers) and a word offset value of $F, where the cache line holds $F words, the next time the instruction pointer is incremented the offset will go from $F to $0 and the cache tag will be incremented to $21. Again, as discussed previously, the new cache line $21 does not represent a valid location in compressed memory where the next desired cache line begins. Applying this example to FIG. 6 step 610, if the word offset is $0, flow proceeds to step 620. At step 620 it is determined whether the previous instruction was a branch or a jump instruction whose indirection was taken. If the previous instruction was a branch or jump instruction, and caused an indirection, the cache tag is correct and flow proceeds to step 640, allowing normal operation. However, if the previous instruction did not cause an indirection, a fall through situation has occurred, and flow proceeds to step 630 where a new tag needs to be calculated to identify the next cache line in compressed memory. The new tag is calculated by taking the current address tag, having a word offset of 0, and subtracting 1; this value represents the previous address tag.
To this value, the offset of the previous tag, as stored in cache line 500, needs to be added. Normal processor flow may now continue at step 640, as the correct tag has been calculated. It would be obvious to one skilled in the art that this offset field may actually be built into the cache memory structure, or it could be contained in any memory location as long as the information is maintained for each active cache line. The offset of cache line 500 is illustrated in FIG. 5 as an extension of the cache line itself.
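The decision made by method 600 can be written out as a small function. This is a sketch under stated assumptions: the name `resolve_tag`, the dictionary `offset_field` (standing in for the per-line offset field 510), and the example size of 9 are all hypothetical.

```python
def resolve_tag(ip_tag, word_offset, prev_took_indirection, offset_field):
    """Return the tag to use after the instruction pointer changes.

    ip_tag: address tag currently held in the instruction pointer.
    word_offset: word offset currently held in the instruction pointer.
    prev_took_indirection: True if the previous instruction was a branch
        or jump whose indirection was taken.
    offset_field: maps a cached line's tag to its size in compressed
        memory (the role of offset field 510; a dict is an assumption).
    """
    if word_offset != 0:          # step 610: not at the start of a line
        return ip_tag
    if prev_took_indirection:     # step 620: the tag came from a jump
        return ip_tag
    previous_tag = ip_tag - 1     # step 630: undo the increment
    return previous_tag + offset_field[previous_tag]


# Line $20 occupies 9 units of compressed memory (made-up size), so a
# fall through from it must land on tag $29, not the incremented $21.
sizes = {0x20: 9}
print(hex(resolve_tag(0x21, 0, False, sizes)))
```

When the offset is nonzero or the previous instruction took its indirection, the tag passes through unchanged, which corresponds to proceeding directly to step 640.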

[0031] At step 610, it is necessary to determine when the word offset is equal to 0. This can be accomplished in a number of ways in either hardware or software. A hardware implementation, which will be discussed with reference to FIGS. 8 and 9, requires generating a signal from the instruction sequencer when the offset value is 0. This information, along with other cache information, would be used by the decompression program to calculate the new tag and access the appropriate information in compressed memory.

[0032] FIG. 7 illustrates in flow diagram form a flow 700 which can fill the cache line 500. Steps 701 and 710 are identical to steps 401 and 410 of FIG. 4 and will not be discussed further. Step 720 of the decompression method 700 calculates the compressed cache line size from the beginning to the end of the cache line block being decompressed. Next, at step 799, post decompression occurs. This would include forwarding the decompressed information as well as the offset information to the cache line 500 or appropriate memory locations.

[0033] FIG. 8 illustrates, in block diagram form, an implementation of the computer processor 22 (FIG. 3), and an instruction memory 858 for implementing a hardware version of step 610 (FIG. 6). In one embodiment of the invention, the computer processor 22 comprises a CPU 850, an instruction cache 852, and a cache refill engine 856. In a different embodiment, the instruction memory 858 could be part of the computer processor. Likewise, the cache refill engine 856, or the instruction cache 852, could reside outside of the computer processor 22. The CPU 850 is coupled to the instruction cache 852 and generates a fall through signal 860 which is coupled to the cache refill engine 856. The instruction cache 852 is coupled to the cache refill engine 856. The cache refill engine 856 is coupled to the instruction memory 858.

[0034] The CPU 850 generates a fall through signal 860 which is received by the cache refill engine 856. The fall through signal 860 notifies the cache refill engine 856 that a fall through situation has occurred as discussed above. FIG. 9 illustrates, in more detail, the generation of the fall through signal 860. In FIG. 9, the CPU 850 is shown having an instruction pointer 902 which is coupled to the execution unit 904. The execution unit 904 has a fall through detection stage 906. The instruction pointer 902 generates the current instruction address; this address has an address tag component 908 and an offset component 910. The address tag component 908 is compared to instruction cache tags to determine if a needed instruction currently resides in the instruction cache 852. Once a successful comparison has occurred, the offset 910 is used to determine which instruction contained in the matching cache line is the current instruction. An offset of 0 indicates that the first instruction in a given instruction cache line is being addressed. The fall through detection stage generates the fall through signal 860 by monitoring the offset 910, and generating an active fall through signal when the offset 910 is equal to zero.

[0035] The cache refill engine 856, upon receiving the asserted fall through signal 860, determines whether the previously executed instruction was a flow indirection instruction that took the indirection branch. If so, the current address tag is correct, as previously discussed. If the previous instruction did not cause an indirection to occur, then a fall through situation has occurred and a new tag needs to be generated. The generation of the new address tag is performed by the cache refill engine, using the methods discussed previously, such as calculating the new address tag based on the compressed size of the previous cache line and its address tag.

[0036] FIG. 10 illustrates a flow 1000 in accordance with the present invention. Flow 1000 begins at a step 1002. At step 1002, a compressed line of code is identified directly by a token. This token has a cache line offset which indicates where a compressed cache line begins, and a word offset which defines which instruction in the cache line is to be accessed. Next, at a step 1004, the compressed cache line is requested to be transmitted from a memory location by transmitting the token. Next, at step 1006, a cache tag is set to equal the cache line offset value represented by the token. Next, at step 1008, the compressed line of code is decompressed. Next, at step 1010, the decompressed code is stored in a cache line.
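Flow 1000 can be sketched end to end as a cache keyed directly by the token's cache line offset. As before, this is only an illustration: zlib substitutes for the unspecified compression, a Python dict stands in for the instruction cache, the 4-bit word offset is assumed, and the name `fetch` is hypothetical.

```python
import zlib

OFFSET_BITS = 4  # assumed width of the word offset inside the token


def fetch(token, memory, cache):
    """Return the instruction named by `token`, refilling the cache on a miss."""
    # Step 1002: the token names the compressed line and the word in it.
    line_offset = token >> OFFSET_BITS
    word_offset = token & ((1 << OFFSET_BITS) - 1)
    if line_offset not in cache:
        # Step 1004: request the compressed line at that offset in memory.
        stream = zlib.decompressobj()
        # Step 1008: decompress; bytes past this line's stream are ignored.
        words = stream.decompress(memory[line_offset:]).decode().split("\n")
        # Steps 1006 and 1010: the cache tag is the line offset itself.
        cache[line_offset] = words
    return cache[line_offset][word_offset]


first = zlib.compress(b"LD X\nADD Y\nST Z")
second = zlib.compress(b"LD Z\nRET")
memory = first + second
cache = {}
print(fetch((len(first) << OFFSET_BITS) | 1, memory, cache))
```

No look-aside structure appears anywhere in the sketch: the same number serves as the memory address for the refill and as the key under which the decompressed line is cached.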

[0037] It is understood that there are many alternative embodiments of the present invention that may be performed. For example, one such embodiment would be to calculate the offset between the current compressed cache line and the next compressed cache line during the compression routine 100, and to store the information somewhere within the compressed block of data. This would eliminate the need to calculate the offset during the decompression step 720 of FIG. 7. In addition, many of these functions may be performed in either hardware or software, and this specification does not address all of the possible embodiments.

[0038] Another embodiment would be similar to the first embodiment; however, instead of storing a jump instruction at the last word location of the cache line, the compression routine could store a second cache line of data at an available cache line having a tag equal to the current cache tag incremented by one. The jump would be the only instruction contained in this cache line, would be stored at the 0 offset location, and would jump to the beginning of the next appropriate cache line. A disadvantage of this embodiment is that an entire cache line would be used to contain a single instruction.

We claim:
 1. A method of accessing compressed code stored in a compressed format in a memory comprising: identifying a compressed line of code by a token directly equivalent to an offset for the compressed line of code stored in the compressed format in the memory; requesting the compressed line of code be transmitted from the memory by transmitting the token; setting a cache tag for the compressed line of code to the token.
 2. The method in claim 1 wherein a location in the compressed code is identified by a tuple comprising: the token identifying the compressed line of code containing the location; and an offset of the location in an uncompressed translation of the compressed line of code containing the location.
 3. The method in claim 2 which further comprises: modifying an indirection instruction to utilize the tuple to identify a jump destination.
 4. The method in claim 2 which further comprises: decompressing the compressed line of code into a decompressed line of code; and
 5. The method in claim 4 which further comprises: storing the decompressed line of code in a cache line buffer identified by the cache tag.
 6. The method in claim 4 which further comprises: executing the decompressed line of code as computer instructions.
 7. A method of compressing instruction data comprising the steps of: modifying a flow indirection instruction in the instruction data to have a modified destination which references a compressed address location, thereby creating a modified instruction data; partitioning the modified instruction data into a plurality of data blocks, each of the plurality of data blocks having a fixed data block size; compressing each of the plurality of data blocks, wherein each of the plurality of data blocks is referred to as a compressed block, and each compressed block contains a plurality of instruction data.
 8. The method of claim 7 further comprising the step of: storing each compressed block in a memory area to create a compressed memory.
 9. The method of claim 7 wherein the step of partitioning the modified instruction data includes the step of partitioning the instruction data.
 10. The method of claim 7 wherein the compressed address reference comprises a first address reference representing a beginning address of the compressed block containing the instruction data destination of the flow indirection instruction, and a second address representing an offset, wherein the offset indicates which one of the plurality of instruction data in the compressed block represents the instruction data destination.
 11. The method of claim 8 further including the step of: adjusting each modified destination, after the step of compressing, that does not correlate to the beginning address of the compressed block containing the instruction data destination, such that each modified destination does correlate to the beginning address of the compressed block containing the instruction data destination.
 12. A method of accessing compressed code in a system comprising the steps of: receiving a beginning address, wherein the beginning address represents the beginning of a compressed data block; decompressing the compressed data block to create an uncompressed data block having a sequential set of uncompressed data, the uncompressed data block having a beginning and an end; determining a beginning address of a next compressed data block based on the compressed data block, wherein the next compressed data block is adjacent to the compressed data block; adding an address designator to the end of the uncompressed data block, the address designator identifying a beginning of the next compressed data block.
 13. The method of claim 12 wherein the step of determining a beginning address of the next compressed data block includes determining the beginning address of the next compressed data block by monitoring a size of the compressed data block during the step of decompressing.
 14. The method of claim 12 wherein the step of determining a beginning address of the next compressed data block includes determining the beginning address of the next compressed data block based on data contained in the compressed data block.
 15. The method of claim 12 wherein the data comprises instruction data for a data processor.
 16. The method of claim 12 wherein the uncompressed data block is of a predetermined size.
 17. The method of claim 12 wherein the uncompressed data block represents a cache line of data.
 18. The method of claim 12 wherein the address designator is a flow indirection instruction.
 19. The method of claim 18 wherein the flow indirection instruction is a jump instruction, or a branch instruction.
 20. The method of claim 12 wherein the address designator is an offset from the beginning address to a beginning address of the next compressed data block.
 21. The method of claim 12 further comprising the step of: storing the uncompressed data block so that it can be accessed by a central processing unit of a data processing system.
 22. A method of accessing compressed code in a data processor system, the data processor system having a central processing unit having an instruction cache, an instruction pointer, and a memory area, the system comprising: operating out of a cached memory area of the data processing system when the cached memory area has a cache line having a cache tag that matches a first address tag; generating a fall through signal when a second address tag is generated by incrementing the instruction pointer, wherein the second address tag is not equal to the first address tag, and a value of the first address tag can be determined by a value of the second address tag; adding a prestored offset value associated with the first cache tag to the value of the first address cache tag to produce a third address tag when there is an active fall through signal, wherein the third address tag is used to directly access a compressed code at an address location in the memory area.
 23. The system of claim 22, wherein the address location represents the beginning of a compressed data block.
 24. The system of claim 22, wherein the fall through signal is generated only when a previous instruction did not cause a flow indirection.
 25. A data processing system for accessing compressed code having a central processing unit, the central processing unit comprising: an instruction cache, the instruction cache having a plurality of cache lines, each of the plurality of cache lines having a cache tag, a plurality of cache words, and a next line offset; an instruction pointer coupled to the instruction cache for indicating an instruction to be accessed, wherein the instruction pointer has a value comprising an address tag for accessing one of the plurality of cache lines and a word offset for accessing one of a plurality of cache words in the one of the plurality of cache lines; a fall-through detection stage coupled to the instruction pointer, the overflow detection stage asserts a fall-through signal when the offset is accessing a first word of the plurality of cache words in the one of the plurality of cache lines; a cache fill stage coupled to the instruction cache, the instruction pointer, and the fall-through detection stage, the cache fill stage comprises a decompression unit for decompressing a compressed block of data, wherein the cache fill stage will modify the address tag when the fall-through signal is asserted.
 26. The system of claim 25 wherein the fall-through detection stage comprises asserting a fall-through signal when the offset is accessing a first word of the plurality of cache words in the one of the plurality of cache lines and a previous instruction did not cause a flow indirection.
 27. The system of claim 25, wherein the cache fill stage will modify the address tag by adding a size of a previous compressed block of data to the cache tag of the previous compressed block of data.
 28. A data processor system for accessing compressed data comprising: a central processor unit for executing instructions; a memory coupled to the central processor unit for containing data processor instructions for execution by the central processor unit, the data processor instructions comprising: identifying a compressed line of code by a token directly equivalent to an offset for the compressed line of code stored in the compressed format in the memory; requesting the compressed line of code be transmitted from the memory by transmitting the token; setting a cache tag for the compressed line of code to the token.
 29. A computer readable medium (storage medium) for storing a data compression routine comprising the steps of: modifying a flow indirection instruction in the instruction data to have a modified destination which references a compressed address location, thereby creating a modified instruction data; partitioning the modified instruction data into a plurality of data blocks, each of the plurality of data blocks having a fixed data block size; compressing each of the plurality of data blocks, wherein each of the plurality of data blocks is referred to as a compressed block, and each compressed block contains a plurality of instruction data.
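
By way of a non-limiting illustration of the access method recited above, the following Python sketch uses the byte offset of each compressed block in memory as both the request token and the cache tag, and derives the token of the adjacent block from the size of the block just read. The use of `zlib` as the compressor, the two-byte length prefix, and the names `build_compressed_memory` and `fetch_line` are assumptions made for this sketch only; the claims do not prescribe a compression algorithm or a storage layout.

```python
import zlib


def build_compressed_memory(blocks):
    """Compress fixed-size blocks and pack them back to back.

    Returns the packed memory image and the token (byte offset) of
    each compressed block within it.
    """
    memory = bytearray()
    tokens = []
    for block in blocks:
        payload = zlib.compress(block)
        tokens.append(len(memory))  # token == offset of this block
        # Assumed layout: 2-byte big-endian length, then the payload.
        memory += len(payload).to_bytes(2, "big") + payload
    return bytes(memory), tokens


def fetch_line(memory, token, cache):
    """Return (decompressed line, token of the adjacent block).

    On a miss the block at `token` is decompressed and cached under a
    tag equal to the token itself.
    """
    if token not in cache:
        size = int.from_bytes(memory[token:token + 2], "big")
        payload = memory[token + 2:token + 2 + size]
        # The next block begins right after this one in memory.
        cache[token] = (zlib.decompress(payload), token + 2 + size)
    return cache[token]
```

In this sketch a fall-through from one line to the next needs no lookup table: the token returned alongside the decompressed line is passed straight back to `fetch_line`.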